Identification of Minimum Redundancy Tagging SNPs via Gibbs Sampling
Abstract
Single nucleotide polymorphisms (SNPs) are genetic changes that can occur within a DNA sequence. Because SNPs occur at high frequency in the human genome, it is desirable to select a small set of SNPs (tagging SNPs) that can represent the majority of SNPs. We propose a Gibbs sampling approach to find a small set of SNPs with minimum redundancy for tagging purposes. Pre-clustering is added to the basic Gibbs sampling procedure to avoid the disturbance caused by local optima. We also propose two general-purpose correlation measures that can accommodate SNPs with three or more alleles. Our experimental results show that the Gibbs sampling process converges faster and finds a better optimum when pre-clustering is conducted before sampling. Although our tagging process is not guided by any prediction algorithm, we obtain results comparable to those of the SNP-prediction-guided algorithm SVM/STSA [1] while requiring much less time.
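The abstract does not spell out the sampler, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes genotypes coded 0/1/2, uses squared Pearson correlation as a stand-in for the paper's correlation measures, uses a simple greedy threshold rule as a stand-in for the pre-clustering step, and defines redundancy as the sum of pairwise r^2 among the chosen tags. All function names, the threshold, and the temperature are hypothetical.

```python
import math
import random


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (assumed 0/1/2 coding)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def precluster(r2, threshold):
    """Greedy stand-in for pre-clustering: a SNP joins the first cluster
    whose seed (first member) it correlates with at or above the threshold."""
    clusters = []
    for i in range(len(r2)):
        for members in clusters:
            if r2[i][members[0]] >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def redundancy(tags, r2):
    """Assumed objective: sum of pairwise r^2 among the selected tag SNPs."""
    return sum(r2[a][b] for k, a in enumerate(tags) for b in tags[k + 1:])


def gibbs_tag_snps(snps, threshold=0.8, sweeps=200, temperature=0.1, seed=0):
    """Pick one tag per cluster by Gibbs sampling from p(tags) ~ exp(-redundancy / T)."""
    rng = random.Random(seed)
    m = len(snps)
    r2 = [[r_squared(snps[i], snps[j]) for j in range(m)] for i in range(m)]
    clusters = precluster(r2, threshold)
    tags = [rng.choice(members) for members in clusters]
    best, best_energy = list(tags), redundancy(tags, r2)
    for _ in range(sweeps):
        for k, members in enumerate(clusters):
            others = tags[:k] + tags[k + 1:]
            # Conditional energy of each candidate given the other clusters' tags.
            energies = [sum(r2[s][t] for t in others) for s in members]
            low = min(energies)  # shift so the largest weight is exactly 1
            weights = [math.exp(-(e - low) / temperature) for e in energies]
            tags[k] = rng.choices(members, weights=weights)[0]
        energy = redundancy(tags, r2)
        if energy < best_energy:
            best, best_energy = list(tags), energy
    return sorted(best), best_energy
```

Calling `gibbs_tag_snps(snps)` with `snps` given as a list of per-SNP genotype lists returns the lowest-redundancy tag set seen during sampling together with its redundancy score. Restricting each Gibbs update to one cluster keeps the conditional distributions small, which is one plausible reading of why pre-clustering would speed up convergence.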
Related articles
Two-Stage sampling designs for gene association studies.
We consider two-stage case-control designs for testing associations between single nucleotide polymorphisms (SNPs) and disease, in which a subsample of subjects is used to select a panel of "tagging" SNPs that will be considered in the main study. We propose a pseudolikelihood [Pepe and Flemming, 1991: JASA 86:108-113] that combines the information from both the main study and the substudy to t...
Simple, Correct Parallelization for Blocked Gibbs Sampling
We present a method for distributing collapsed Gibbs sampling over multiple processors that is simple, statistically correct, and memory efficient. The method uses blocked sampling, dividing the training data into relatively large blocks, and distributing the sampling of each block over multiple processors. At the end of each parallel run, Metropolis-Hastings rejection sampling is performe...
Estimation of Linear Systems using a Gibbs Sampler
This paper considers a Bayesian approach to linear system identification. One motivation is the advantage of the minimum mean square error of the associated conditional mean estimate. A further motivation is the error quantifications afforded by the posterior density, which are not reliant on asymptotic-in-data-length derivations. To compute these posterior quantities, this paper derives and ill...
Using Prior Probabilities and Density Estimation for Relational Classification
A Bayesian method for incorporating probabilistic background knowledge into ILP is presented. Positive-only learning is extended to allow density estimation. Estimated densities and defined priors are combined in Bayes' theorem to perform relational classification. An initial application of the technique is made to part-of-speech (POS) tagging. A novel use of Gibbs sampling for POS tagging is given.
Bayesian Inference of (Co) Variance Components and Genetic Parameters for Economic Traits in Iranian Holsteins via Gibbs Sampling
The aim of this study was to use a Bayesian approach via Gibbs sampling (GS) to estimate genetic parameters of production, reproduction, and health traits in Iranian Holstein cows. The data consisted of 320,666 first-lactation records of Holstein cows from 7,696 sires and 260,302 dams, collected by the animal breeding center of Iran from 1991 to 2010. (Co)variance components were estimated using ...